Search | BVS Regional Portal

1.

Risk factors affecting polygenic score performance across diverse cohorts.

Hui, Daniel; Dudek, Scott; Kiryluk, Krzysztof; Walunas, Theresa L; Kullo, Iftikhar J; Wei, Wei-Qi; Tiwari, Hemant K; Peterson, Josh F; Chung, Wendy K; Davis, Brittney; Khan, Atlas; Kottyan, Leah; Limdi, Nita A; Feng, Qiping; Puckelwartz, Megan J; Weng, Chunhua; Smith, Johanna L; Karlson, Elizabeth W; Jarvik, Gail P; Ritchie, Marylyn D.

medRxiv ; 2024 Apr 10.

Article in English | MEDLINE | ID: mdl-38645167

ABSTRACT

Apart from ancestry, personal or environmental covariates may contribute to differences in polygenic score (PGS) performance. We analyzed effects of covariate stratification and interaction on body mass index (BMI) PGS (PGSBMI) across four cohorts of European (N=491,111) and African (N=21,612) ancestry. Stratifying on binary covariates and quintiles for continuous covariates, 18/62 covariates had significant and replicable R² differences among strata. Covariates with the largest differences included age, sex, blood lipids, physical activity, and alcohol consumption, with R² being nearly double between best and worst performing quintiles for certain covariates. 28 covariates had significant PGSBMI-covariate interaction effects, modifying PGSBMI effects by nearly 20% per standard deviation change. We observed overlap between covariates that had significant R² differences among strata and interaction effects - across all covariates, their main effects on BMI were correlated with their maximum R² differences and interaction effects (0.56 and 0.58, respectively), suggesting high-PGSBMI individuals have highest R² and increase in PGS effect. Using quantile regression, we show the effect of PGSBMI increases as BMI itself increases, and that these differences in effects are directly related to differences in R² when stratifying by different covariates. Given significant and replicable evidence for context-specific PGSBMI performance and effects, we investigated ways to increase model performance taking into account non-linear effects. Machine learning models (neural networks) increased relative model R² (mean 23%) across datasets. Finally, creating PGSBMI directly from GxAge GWAS effects increased relative R² by 7.8%. These results demonstrate that certain covariates, especially those most associated with BMI, significantly affect both PGSBMI performance and effects across diverse cohorts and ancestries, and we provide avenues to improve model performance that consider these effects.
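The stratified comparison described in this abstract can be illustrated with a minimal sketch. It assumes a single-predictor R² (the squared Pearson correlation between the score and BMI) and equal-sized strata; the study's actual models, covariate adjustments, and significance tests are not given here, and the function names `r_squared` and `r2_by_quantile` are hypothetical.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation, i.e. R^2 of a one-predictor linear fit."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # no variance in a stratum: report zero explained variance
    return (sxy * sxy) / (sxx * syy)


def r2_by_quantile(pgs, bmi, covariate, n_bins=5):
    """Sort samples by the covariate, cut into n_bins equal-sized strata,
    and return the PGS-vs-BMI R^2 within each stratum (lowest stratum first)."""
    order = sorted(range(len(covariate)), key=lambda i: covariate[i])
    size = len(order) // n_bins
    results = []
    for b in range(n_bins):
        start = b * size
        # The last stratum absorbs any remainder.
        end = (b + 1) * size if b < n_bins - 1 else len(order)
        idx = order[start:end]
        results.append(r_squared([pgs[i] for i in idx], [bmi[i] for i in idx]))
    return results


# Toy data (invented): the score tracks BMI in the low-covariate half only.
pgs = [1, 2, 3, 4, 1, 2, 3, 4]
bmi = [2, 4, 6, 8, 5, 3, 3, 5]
covariate = [0, 0, 0, 0, 1, 1, 1, 1]
print(r2_by_quantile(pgs, bmi, covariate, n_bins=2))
```

Comparing the per-stratum values, for example the ratio between the best and worst stratum, gives the kind of contrast the abstract reports for quintiles.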

2.

Leveraging generative AI for clinical evidence synthesis needs to ensure trustworthiness.

Zhang, Gongbo; Jin, Qiao; Jered McInerney, Denis; Chen, Yong; Wang, Fei; Cole, Curtis L; Yang, Qian; Wang, Yanshan; Malin, Bradley A; Peleg, Mor; Wallace, Byron C; Lu, Zhiyong; Weng, Chunhua; Peng, Yifan.

J Biomed Inform ; 153: 104640, 2024 Apr 10.

Article in English | MEDLINE | ID: mdl-38608915

ABSTRACT

Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating the arduous task. However, developing accountable, fair, and inclusive models remains a complicated undertaking. In this perspective, we discuss the trustworthiness of generative AI in the context of automated summarization of medical evidence.

3.

Assessing the Utility of Large Language Models for Phenotype-Driven Gene Prioritization in Rare Genetic Disorder Diagnosis.

Kim, Junyoung; Yang, Jingye; Wang, Kai; Weng, Chunhua; Liu, Cong.

ArXiv ; 2024 Apr 02.

Article in English | MEDLINE | ID: mdl-38562452

ABSTRACT

Phenotype-driven gene prioritization is a critical process in the diagnosis of rare genetic disorders for identifying and ranking potential disease-causing genes based on observed physical traits or phenotypes. While traditional approaches rely on curated knowledge graphs with phenotype-gene relations, recent advancements in large language models have opened doors to the potential of AI predictions through extensive training on diverse corpora and complex models. This study conducted a comprehensive evaluation of five large language models (LLMs), two from the Generative Pre-trained Transformer (GPT) series and three from the Llama2 series, assessing their performance across three key metrics: task completeness, gene prediction accuracy, and adherence to required output structures. Various experiments explored combinations of models, prompts, input types, and task difficulty levels. Our findings reveal that even the best-performing LLM, GPT-4, achieved an accuracy of only 16.0%, which still lags behind traditional bioinformatics tools. Prediction accuracy increased with the parameter/model size. A similar increasing trend was observed for the task completion rate, with complicated prompts more likely to increase task completeness in models smaller than GPT-4. However, complicated prompts were more likely to decrease the structure compliance rate, although prompts had no effect on GPT-4. With free-text input, the LLMs also achieved better-than-random prediction accuracy, though slightly lower than with HPO term-based input. Bias analysis showed that certain genes, such as MECP2, CDKL5, and SCN1A, are more likely to be top-ranked, potentially explaining the variances observed across different datasets. This study provides valuable insights into the integration of LLMs within genomic analysis, contributing to the ongoing discussion on the utilization of advanced LLMs in clinical workflows.
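The three metrics named in this abstract can be sketched as follows. The abstract does not define them precisely, so the definitions here are assumptions: a response counts as complete if it is non-empty, as compliant if it is a clean comma-separated list of gene symbols, and as correct if the true gene appears among the first `top_k` symbols. The function names and the required output format are hypothetical.

```python
import re

# Assumed output format: uppercase gene symbols separated by commas.
GENE_PATTERN = re.compile(r"^[A-Z][A-Z0-9-]*$")


def parse_gene_list(response):
    """Return the list of gene symbols if the response follows the assumed
    format, otherwise None."""
    parts = [p.strip() for p in response.strip().split(",")]
    if any(not GENE_PATTERN.match(p) for p in parts):
        return None
    return parts


def evaluate(responses, truths, top_k=10):
    """Compute task completeness, structure compliance, and top-k accuracy."""
    n = len(responses)
    completed = compliant = correct = 0
    for response, truth in zip(responses, truths):
        if response.strip():
            completed += 1
        genes = parse_gene_list(response)
        if genes is None:
            continue  # non-compliant output is not scored for accuracy here
        compliant += 1
        if truth in genes[:top_k]:
            correct += 1
    return {
        "completeness": completed / n,
        "compliance": compliant / n,
        "accuracy": correct / n,
    }


# Invented responses for illustration only.
responses = ["MECP2, CDKL5, SCN1A", "The most likely gene is SCN1A.", ""]
truths = ["CDKL5", "SCN1A", "MECP2"]
print(evaluate(responses, truths))
```

Scoring non-compliant responses as incorrect is one possible design choice; a more lenient evaluation could extract gene symbols from free text before scoring.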

4.

A span-based model for extracting overlapping PICO entities from randomized controlled trial publications.

Zhang, Gongbo; Zhou, Yiliang; Hu, Yan; Xu, Hua; Weng, Chunhua; Peng, Yifan.

J Am Med Inform Assoc ; 31(5): 1163-1171, 2024 Apr 19.

Article in English | MEDLINE | ID: mdl-38471120

ABSTRACT

OBJECTIVES: Extracting PICO (Populations, Interventions, Comparison, and Outcomes) entities is fundamental to evidence retrieval. We present a novel method, PICOX, to extract overlapping PICO entities. MATERIALS AND METHODS: PICOX first identifies entities by assessing whether a word marks the beginning or conclusion of an entity. Then, it uses a multi-label classifier to assign one or more PICO labels to a span candidate. PICOX was evaluated using 1 of the best-performing baselines, EBM-NLP, and 3 more datasets, ie, PICO-Corpus and randomized controlled trial publications on Alzheimer's Disease (AD) or COVID-19, using entity-level precision, recall, and F1 scores. RESULTS: PICOX achieved superior precision, recall, and F1 scores across the board, with the micro F1 score improving from 45.05 to 50.87 (P ≪ .01). On the PICO-Corpus, PICOX obtained higher recall and F1 scores than the baseline and improved the micro recall score from 56.66 to 67.33. On the COVID-19 dataset, PICOX also outperformed the baseline and improved the micro F1 score from 77.10 to 80.32. On the AD dataset, PICOX demonstrated comparable F1 scores with higher precision when compared to the baseline. CONCLUSION: PICOX excels in identifying overlapping entities and consistently surpasses a leading baseline across multiple datasets. Ablation studies reveal that its data augmentation strategy effectively minimizes false positives and improves precision.

Subjects

Alzheimer Disease, COVID-19, Humans, Natural Language Processing
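The two-step idea in the PICOX abstract above (boundary detection, then labeling of span candidates) and its entity-level scoring can be sketched in a simplified form. This is not the PICOX implementation: the boundary flags are taken as given rather than predicted by a model, the pairing rule and `max_len` cap are assumptions, and the function names are hypothetical.

```python
def candidate_spans(starts, ends, max_len=10):
    """Pair each flagged start token with every flagged end token at or after
    it (within max_len tokens). Nested and overlapping spans are kept."""
    spans = []
    for i, is_start in enumerate(starts):
        if not is_start:
            continue
        for j in range(i, min(i + max_len, len(ends))):
            if ends[j]:
                spans.append((i, j))
    return spans


def micro_prf(gold, predicted):
    """Entity-level micro precision, recall, and F1 over (start, end, label)
    triples. A span with two labels contributes two triples."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Four tokens; tokens 0 and 2 are flagged as starts, tokens 1 and 3 as ends.
spans = candidate_spans([1, 0, 1, 0], [0, 1, 0, 1])
print(spans)  # span (2, 3) is nested inside span (0, 3)

gold = [(0, 3, "P"), (2, 3, "I"), (2, 3, "O")]
pred = [(0, 3, "P"), (2, 3, "I"), (0, 1, "I")]
print(micro_prf(gold, pred))
```

Treating each (span, label) pair as a separate item is what lets one span carry several PICO labels, which a single-label token tagger cannot express.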

5.

Sociotechnical feasibility of natural language processing-driven tools in clinical trial eligibility prescreening for Alzheimer's disease and related dementias.

Idnay, Betina; Liu, Jianfang; Fang, Yilu; Hernandez, Alex; Kaw, Shivani; Etwaru, Alicia; Juarez Padilla, Janeth; Ramírez, Sergio Ozoria; Marder, Karen; Weng, Chunhua; Schnall, Rebecca.

J Am Med Inform Assoc ; 31(5): 1062-1073, 2024 Apr 19.

Article in English | MEDLINE | ID: mdl-38447587

ABSTRACT

BACKGROUND: Alzheimer's disease and related dementias (ADRD) affect over 55 million people globally. Current clinical trials suffer from low recruitment rates, a challenge potentially addressable via natural language processing (NLP) technologies that help researchers effectively identify eligible clinical trial participants. OBJECTIVE: This study investigates the sociotechnical feasibility of NLP-driven tools for ADRD research prescreening and analyzes the effect of the tools' cognitive complexity on usability to identify cognitive support strategies. METHODS: A randomized experiment was conducted with 60 clinical research staff using three prescreening tools (Criteria2Query, Informatics for Integrating Biology and the Bedside [i2b2], and Leaf). Cognitive task analysis was employed to analyze the usability of each tool using the Health Information Technology Usability Evaluation Scale. Data analysis involved calculating descriptive statistics, interrater agreement via intraclass correlation coefficient, cognitive complexity, and Generalized Estimating Equations models. RESULTS: Leaf scored highest for usability followed by Criteria2Query and i2b2. Cognitive complexity was found to be affected by age, computer literacy, and number of criteria, but was not significantly associated with usability. DISCUSSION: Adopting NLP for ADRD prescreening demands careful task delegation, comprehensive training, precise translation of eligibility criteria, and increased research accessibility. The study highlights the relevance of these factors in enhancing NLP-driven tools' usability and efficacy in clinical research prescreening. CONCLUSION: User-modifiable NLP-driven prescreening tools were favorably received, with system type, evaluation sequence, and user's computer literacy influencing usability more than cognitive complexity. The study emphasizes NLP's potential in improving recruitment for clinical trials, endorsing a mixed-methods approach for future system evaluation and enhancements.

Subjects

Alzheimer Disease, Medical Informatics, Humans, Natural Language Processing, Feasibility Studies, Eligibility Determination

6.

Retrieval augmented scientific claim verification.

Liu, Hao; Soroush, Ali; Nestor, Jordan G; Park, Elizabeth; Idnay, Betina; Fang, Yilu; Pan, Jane; Liao, Stan; Bernard, Marguerite; Peng, Yifan; Weng, Chunhua.

JAMIA Open ; 7(1): ooae021, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38455840

ABSTRACT

Objective: To automate scientific claim verification using PubMed abstracts. Materials and Methods: We developed CliVER, an end-to-end scientific Claim VERification system that leverages retrieval-augmented techniques to automatically retrieve relevant clinical trial abstracts, extract pertinent sentences, and use the PICO framework to support or refute a scientific claim. We also created an ensemble of three state-of-the-art deep learning models to classify rationale of support, refute, and neutral. We then constructed CoVERt, a new COVID VERification dataset comprising 15 PICO-encoded drug claims accompanied by 96 manually selected and labeled clinical trial abstracts that either support or refute each claim. We used CoVERt and SciFact (a public scientific claim verification dataset) to assess CliVER's performance in predicting labels. Finally, we compared CliVER to clinicians in the verification of 19 claims from 6 disease domains, using 189 648 PubMed abstracts extracted from January 2010 to October 2021. Results: In the evaluation of label prediction accuracy on CoVERt, CliVER achieved a notable F1 score of 0.92, highlighting the efficacy of the retrieval-augmented models. The ensemble model outperforms each individual state-of-the-art model by an absolute increase from 3% to 11% in the F1 score. Moreover, when compared with four clinicians, CliVER achieved a precision of 79.0% for abstract retrieval, 67.4% for sentence selection, and 63.2% for label prediction, respectively. Conclusion: CliVER demonstrates its early potential to automate scientific claim verification using retrieval-augmented strategies to harness the wealth of clinical trial abstracts in PubMed. Future studies are warranted to further test its clinical utility.
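The abstract mentions an ensemble of three models that classifies each rationale as support, refute, or neutral, but it does not say how the three predictions are combined. The sketch below assumes simple majority voting with a fallback to "neutral" when there is no strict winner; both the combination rule and the function name `ensemble_vote` are assumptions, not CliVER's documented behavior.

```python
from collections import Counter

LABELS = ("support", "refute", "neutral")


def ensemble_vote(predictions, fallback="neutral"):
    """Return the label predicted by the most models.

    If two or more labels tie for the top count, return the fallback label
    (an assumed tie-breaking rule)."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    unknown = [p for p in predictions if p not in LABELS]
    if unknown:
        raise ValueError(f"unknown labels: {unknown}")
    ranked = Counter(predictions).most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return fallback
    return top_label


print(ensemble_vote(["support", "support", "refute"]))
print(ensemble_vote(["support", "refute", "neutral"]))
```

Averaging the models' class probabilities is a common alternative when calibrated scores are available; it avoids ties but requires more than hard labels from each model.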

7.

A Survey of Clinicians' Views of the Utility of Large Language Models.

Spotnitz, Matthew; Idnay, Betina; Gordon, Emily R; Shyu, Rebecca; Zhang, Gongbo; Liu, Cong; Cimino, James J; Weng, Chunhua.

Appl Clin Inform ; 15(2): 306-312, 2024 Mar.

Article in English | MEDLINE | ID: mdl-38442909

ABSTRACT

OBJECTIVES: Large language models (LLMs) like Generative pre-trained transformer (ChatGPT) are powerful algorithms that have been shown to produce human-like text from input data. Several potential clinical applications of this technology have been proposed and evaluated by biomedical informatics experts. However, few have surveyed health care providers for their opinions about whether the technology is fit for use. METHODS: We distributed a validated mixed-methods survey to gauge practicing clinicians' comfort with LLMs for a breadth of tasks in clinical practice, research, and education, which were selected from the literature. RESULTS: A total of 30 clinicians fully completed the survey. Of the 23 tasks, 16 were rated positively by more than 50% of the respondents. Based on our qualitative analysis, health care providers considered LLMs to have excellent synthesis skills and efficiency. However, our respondents had concerns that LLMs could generate false information and propagate training data bias. Our survey respondents were most comfortable with scenarios that allow LLMs to function in an assistive role, like a physician extender or trainee. CONCLUSION: In a mixed-methods survey of clinicians about LLM use, health care providers were encouraging of having LLMs in health care for many tasks, and especially in assistive roles. There is a need for continued human-centered development of both LLMs and artificial intelligence in general.

Subjects

Algorithms, Artificial Intelligence, Humans, Health Facilities, Health Personnel, Language

8.

Natural language processing to identify lupus nephritis phenotype in electronic health records.

Deng, Yu; Pacheco, Jennifer A; Ghosh, Anika; Chung, Anh; Mao, Chengsheng; Smith, Joshua C; Zhao, Juan; Wei, Wei-Qi; Barnado, April; Dorn, Chad; Weng, Chunhua; Liu, Cong; Cordon, Adam; Yu, Jingzhi; Tedla, Yacob; Kho, Abel; Ramsey-Goldman, Rosalind; Walunas, Theresa; Luo, Yuan.

BMC Med Inform Decis Mak ; 22(Suppl 2): 348, 2024 Mar 03.

Article in English | MEDLINE | ID: mdl-38433189

ABSTRACT

BACKGROUND: Systemic lupus erythematosus (SLE) is a rare autoimmune disorder characterized by an unpredictable course of flares and remission with diverse manifestations. Lupus nephritis, one of the major disease manifestations of SLE for organ damage and mortality, is a key component of lupus classification criteria. Accurately identifying lupus nephritis in electronic health records (EHRs) would therefore benefit large cohort observational studies and clinical trials where characterization of the patient population is critical for recruitment, study design, and analysis. Lupus nephritis can be recognized through procedure codes and structured data, such as laboratory tests. However, other critical information documenting lupus nephritis, such as histologic reports from kidney biopsies and prior medical history narratives, requires sophisticated text processing to mine information from pathology reports and clinical notes. In this study, we developed algorithms to identify lupus nephritis with and without natural language processing (NLP) using EHR data from the Northwestern Medicine Enterprise Data Warehouse (NMEDW). METHODS: We developed five algorithms: a rule-based algorithm using only structured data (baseline algorithm) and four algorithms using different NLP models. The first NLP model applied a simple regular expression keyword search combined with structured data. The other three NLP models were based on regularized logistic regression and used different sets of features including positive mention of concept unique identifiers (CUIs), number of appearances of CUIs, and a mixture of three components (i.e. a curated list of CUIs, regular expression concepts, structured data) respectively. The baseline algorithm and the best performing NLP algorithm were externally validated on a dataset from Vanderbilt University Medical Center (VUMC). RESULTS: Our best performing NLP model incorporated features from structured data, regular expression concepts, and mapped concept unique identifiers (CUIs) and showed improved F measure in both the NMEDW (0.41 vs 0.79) and VUMC (0.52 vs 0.93) datasets compared to the baseline lupus nephritis algorithm. CONCLUSION: Our NLP MetaMap mixed model improved the F-measure greatly compared to the structured data only algorithm in both internal and external validation datasets. The NLP algorithms can serve as powerful tools to accurately identify lupus nephritis phenotype in EHR for clinical research and better targeted therapies.

Subjects

Systemic Lupus Erythematosus, Lupus Nephritis, Humans, Lupus Nephritis/diagnosis, Electronic Health Records, Natural Language Processing, Phenotype, Rare Diseases
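The first NLP model in the lupus nephritis abstract above combines a regular expression keyword search with structured data. A minimal sketch of that idea follows, together with the F-measure used for evaluation. The keyword pattern, the boolean `has_structured_evidence` flag, and the OR combination rule are all assumptions made for illustration; the study's actual keywords and rules are not given in the abstract.

```python
import re

# Hypothetical keyword pattern; the study's real keyword list is not given here.
KEYWORD = re.compile(r"\blupus\s+nephritis\b", re.IGNORECASE)


def flag_patient(notes, has_structured_evidence):
    """Flag a patient if any note mentions the keyword or if structured data
    (e.g. codes or laboratory results, summarized as one boolean) is positive."""
    text_hit = any(KEYWORD.search(note) for note in notes)
    return text_hit or has_structured_evidence


def f_measure(gold, predicted):
    """Balanced F-measure (F1) from parallel lists of booleans."""
    tp = sum(1 for g, p in zip(gold, predicted) if g and p)
    fp = sum(1 for g, p in zip(gold, predicted) if not g and p)
    fn = sum(1 for g, p in zip(gold, predicted) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


print(flag_patient(["Biopsy consistent with Lupus  nephritis."], False))
print(f_measure([True, True, False, False], [True, False, True, False]))
```

A bare keyword match also fires on negated mentions such as "no evidence of lupus nephritis", which is one reason concept-based features with negation handling can outperform it.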

9.

Return of polygenic risk scores in research: Stakeholders' views on the eMERGE-IV study.

Sabatello, Maya; Bakken, Suzanne; Chung, Wendy K; Cohn, Elizabeth; Crew, Katherine D; Kiryluk, Krzysztof; Kukafka, Rita; Weng, Chunhua; Appelbaum, Paul S.

HGG Adv ; 5(2): 100281, 2024 Apr 11.

Article in English | MEDLINE | ID: mdl-38414240

ABSTRACT

Research on polygenic risk scores (PRSs) for common, genetically complex chronic diseases aims to improve health-related predictions, tailor risk-reducing interventions, and improve health outcomes. Yet, the study and use of PRSs in clinical settings raise equity, clinical, and regulatory challenges that can be greater for individuals from historically marginalized racial, ethnic, and other minoritized communities. As part of the National Human Genome Research Institute-funded Electronic Medical Records and Genomics IV Network, we conducted online focus groups with patients/community members, clinicians, and members of institutional review boards to explore their views on key issues, including PRS research, return of PRS results, clinical translation, and barriers and facilitators to health behavioral changes in response to PRS results. Across stakeholder groups, our findings indicate support for PRS development and a strong interest in having PRS results returned to research participants. However, we also found multi-level barriers and significant differences in stakeholders' views about what is needed and possible for successful implementation. These include researcher-participant interaction formats, health and genomic literacy, and a range of structural barriers, such as financial instability, insurance coverage, and the absence of health-supporting infrastructure and affordable healthy food options in poorer neighborhoods. Our findings highlight the need to revisit and implement measures in PRS studies (e.g., incentives and resources for follow-up care), as well as system-level policies to promote equity in genomic research and health outcomes.

Subjects

Electronic Health Records, 60488, Humans, Focus Groups

10.

Ethical Considerations for Artificial Intelligence in Dermatology: A Scoping Review.

Gordon, Emily R; Trager, Megan H; Kontos, Despina; Weng, Chunhua; Geskin, Larisa J; Dugdale, Lydia S; Samie, Faramarz H.

Br J Dermatol ; 2024 Feb 08.

Article in English | MEDLINE | ID: mdl-38330217

ABSTRACT

The field of dermatology is experiencing the rapid deployment of artificial intelligence (AI), from mobile applications for skin cancer detection to large language models like ChatGPT that can answer generalist or specialist questions about skin diagnoses. With these new applications, ethical concerns have emerged. In this scoping review, we aim to identify the applications of AI to the field of dermatology and to understand their ethical implications. We utilized a multifaceted search approach, searching PubMed, Medline, Cochrane, and Google Scholar for primary literature according to the Preferred Reporting Items for Systematic Reviews and Meta-analysis (PRISMA) Extension for Scoping Reviews. Our advanced query included terms related to dermatology, artificial intelligence, and ethical considerations. Our search yielded a total of 202 papers. After initial screening, 68 studies were included. Thirty-two related to clinical image analysis and raised ethical concerns for misdiagnosis, data security, violations of privacy, and replacement of dermatologist jobs. Seventeen discussed limited skin of color representation in datasets leading to potential misdiagnosis in the general population. Nine articles about teledermatology raised ethical concerns, including the exacerbation of health disparities, lack of standardized regulations, informed consent for AI use, and privacy challenges. Seven addressed inaccuracies of responses of large language models. Seven examined attitudes and trust towards AI, with most patients requesting supplemental assessment by a physician to ensure reliability and accountability. Benefits of artificial intelligence integration into clinical practice include increased patient access, improved clinical decision making, efficiency, and many others. However, safeguards must be implemented to ensure ethical applications of artificial intelligence.

11.

Enhancing phenotype recognition in clinical notes using large language models: PhenoBCBERT and PhenoGPT.

Yang, Jingye; Liu, Cong; Deng, Wendy; Wu, Da; Weng, Chunhua; Zhou, Yunyun; Wang, Kai.

Patterns (N Y) ; 5(1): 100887, 2024 Jan 12.

Article in English | MEDLINE | ID: mdl-38264716

ABSTRACT

To enhance phenotype recognition in clinical notes of genetic diseases, we developed two models, PhenoBCBERT and PhenoGPT, for expanding the vocabularies of Human Phenotype Ontology (HPO) terms. While HPO offers a standardized vocabulary for phenotypes, existing tools often fail to capture the full scope of phenotypes due to limitations from traditional heuristic or rule-based approaches. Our models leverage large language models to automate the detection of phenotype terms, including those not in the current HPO. We compared these models with PhenoTagger, another HPO recognition tool, and found that our models identify a wider range of phenotype concepts, including previously uncharacterized ones. Our models also show strong performance in case studies on biomedical literature. We evaluate the strengths and weaknesses of BERT- and GPT-based models in aspects such as architecture and accuracy. Overall, our models enhance automated phenotype detection from clinical texts, improving downstream analyses on human diseases.

12.

Fine-tuning Large Language Models for Rare Disease Concept Normalization.

Wang, Andy; Liu, Cong; Yang, Jingye; Weng, Chunhua.

bioRxiv ; 2024 Apr 14.

Article in English | MEDLINE | ID: mdl-38234802

ABSTRACT

Objective: We aim to develop a novel method for rare disease concept normalization by fine-tuning Llama 2, an open-source large language model (LLM), using a domain-specific corpus sourced from the Human Phenotype Ontology (HPO). Methods: We developed an in-house template-based script to generate two corpora for fine-tuning. The first (NAME) contains standardized HPO names, sourced from the HPO vocabularies, along with their corresponding identifiers. The second (NAME+SYN) includes HPO names and half of the concept's synonyms as well as identifiers. Subsequently, we fine-tuned Llama2 (Llama2-7B) for each sentence set and conducted an evaluation using a range of sentence prompts and various phenotype terms. Results: When the phenotype terms for normalization were included in the fine-tuning corpora, both models demonstrated nearly perfect performance, averaging over 99% accuracy. In comparison, ChatGPT-3.5 achieved only ~20% accuracy in identifying HPO IDs for phenotype terms. When single-character typos were introduced in the phenotype terms, the accuracy of NAME and NAME+SYN was 10.2% and 36.1%, respectively, but increased to 61.8% (NAME+SYN) with additional typo-specific fine-tuning. For terms sourced from HPO vocabularies as unseen synonyms, the NAME model achieved 11.2% accuracy, while the NAME+SYN model achieved 92.7% accuracy. Conclusion: Our fine-tuned models demonstrate the ability to normalize phenotype terms unseen in the fine-tuning corpus, including misspellings, synonyms, terms from other ontologies, and laymen's terms. Our approach provides a solution for the use of LLMs to identify named medical entities from clinical narratives, while successfully normalizing them to standard concepts in a controlled vocabulary.
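The corpus construction described in this abstract can be sketched as a template-based generator that produces a NAME corpus and a NAME+SYN corpus, plus a helper that injects a single-character typo for robustness testing. The sentence template, the choice of which half of the synonyms to keep, and the use of character substitution as the typo type are assumptions; the authors' in-house script is not reproduced here. The example concept (HP:0001250, Seizure) and its synonyms are included only for illustration.

```python
import random

# Hypothetical sentence template for the fine-tuning corpora.
TEMPLATE = "The Human Phenotype Ontology identifier of {term} is {hpo_id}."


def build_corpora(concepts):
    """Build (NAME, NAME+SYN) sentence lists from (hpo_id, name, synonyms)
    tuples. NAME+SYN adds the first half of each concept's synonyms (an
    assumed selection rule), leaving the rest unseen for evaluation."""
    name_corpus, name_syn_corpus = [], []
    for hpo_id, name, synonyms in concepts:
        sentence = TEMPLATE.format(term=name, hpo_id=hpo_id)
        name_corpus.append(sentence)
        name_syn_corpus.append(sentence)
        for synonym in synonyms[: len(synonyms) // 2]:
            name_syn_corpus.append(TEMPLATE.format(term=synonym, hpo_id=hpo_id))
    return name_corpus, name_syn_corpus


def add_single_char_typo(term, rng):
    """Replace one alphabetic character with a different lowercase letter."""
    positions = [i for i, ch in enumerate(term) if ch.isalpha()]
    if not positions:
        return term
    i = rng.choice(positions)
    choices = [c for c in "abcdefghijklmnopqrstuvwxyz" if c != term[i].lower()]
    return term[:i] + rng.choice(choices) + term[i + 1:]


concepts = [("HP:0001250", "Seizure", ["Seizures", "Epileptic seizure"])]
name_corpus, name_syn_corpus = build_corpora(concepts)
print(name_syn_corpus)
print(add_single_char_typo("Seizure", random.Random(0)))
```

Holding out half of the synonyms is what makes the "unseen synonym" evaluation in the abstract possible: the held-out half never appears in the fine-tuning sentences.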

13.

Inborn Errors of Immunity Contribute to the Burden of Skin Disease and Create Opportunities for Improving the Practice of Dermatology.

Colvin, Annelise; Youssef, Soundos; Noh, Heeju; Wright, Julia; Jumonville, Ghislaine; LaRow Brown, Kathleen; Tatonetti, Nicholas P; Milner, Joshua D; Weng, Chunhua; Bordone, Lindsey A; Petukhova, Lynn.

J Invest Dermatol ; 144(2): 307-315.e1, 2024 Feb.

Article in English | MEDLINE | ID: mdl-37716649

ABSTRACT

Opportunities to improve the clinical management of skin disease are being created by advances in genomic medicine. Large-scale sequencing increasingly challenges notions about single-gene disorders. It is now apparent that monogenic etiologies make appreciable contributions to the population burden of disease and that they are underrecognized in clinical practice. A genetic diagnosis informs on molecular pathology and may direct targeted treatments and tailored prevention strategies for patients and family members. It also generates knowledge about disease pathogenesis and management that is relevant to patients without rare pathogenic variants. Inborn errors of immunity are a large class of monogenic etiologies that have been well-studied and contribute to the population burden of inflammatory diseases. To further delineate the contributions of inborn errors of immunity to the pathogenesis of skin disease, we performed a set of analyses that identified 316 inborn errors of immunity associated with skin pathologies, including common skin diseases. These data suggest that clinical sequencing is underutilized in dermatology. We next use these data to derive a network that illuminates the molecular relationships of these disorders and suggests an underlying etiological organization to immune-mediated skin disease. Our results motivate the further development of a molecularly derived and data-driven reorganization of clinical diagnoses of skin disease.

Subjects

Dermatology, Skin Diseases, Humans, Skin Diseases/genetics, Skin Diseases/therapy, Skin, Molecular Pathology

14.

Participant-guided development of bilingual genomic educational infographics for Electronic Medical Records and Genomics Phase IV study.

Casillan, Aimiel; Florido, Michelle E; Galarza-Cornejo, Jamie; Bakken, Suzanne; Lynch, John A; Chung, Wendy K; Mittendorf, Kathleen F; Berner, Eta S; Connolly, John J; Weng, Chunhua; Holm, Ingrid A; Khan, Atlas; Kiryluk, Krzysztof; Limdi, Nita A; Petukhova, Lynn; Sabatello, Maya; Wynn, Julia.

J Am Med Inform Assoc ; 31(2): 306-316, 2024 Jan 18.

Article in English | MEDLINE | ID: mdl-37860921

ABSTRACT

OBJECTIVE: Developing targeted, culturally competent educational materials is critical for participant understanding of engagement in a large genomic study that uses computational pipelines to produce genome-informed risk assessments. MATERIALS AND METHODS: Guided by the Smerecnik framework that theorizes understanding of multifactorial genetic disease through 3 knowledge types, we developed English and Spanish infographics for individuals enrolled in the Electronic Medical Records and Genomics Network. Infographics were developed to explain concepts in lay language and visualizations. We conducted iterative sessions using a modified "think-aloud" process with 10 participants (6 English, 4 Spanish-speaking) to explore comprehension of and attitudes towards the infographics. RESULTS: We found that all but one participant had "awareness knowledge" of genetic disease risk factors upon viewing the infographics. Many participants had difficulty with "how-to" knowledge of applying genetic risk factors to specific monogenic and polygenic risks. Participant attitudes towards the iteratively-refined infographics indicated that design saturation was reached. DISCUSSION: There were several elements that contributed to the participants' comprehension (or misunderstanding) of the infographics. Visualization and iconography techniques best resonated with those who could draw on prior experiences or knowledge and were absent in those without. Limited graphicacy interfered with the understanding of absolute and relative risks when presented in graph format. Notably, narrative and storytelling theory that informed the creation of a vignette infographic was most accessible to all participants. CONCLUSION: Engagement with the intended audience who can identify strengths and points for improvement of the intervention is necessary to the development of effective infographics.

Subjects

Data Visualization, Electronic Health Records, Humans, Communication, Genomics, Health Education/methods

15.

Applying unsupervised machine learning approaches to characterize autologous breast reconstruction patient subgroups: an NSQIP analysis of 14,274 patients.

Kim, Dylan K; Corpuz, George S; Ta, Casey N; Weng, Chunhua; Rohde, Christine H.

J Plast Reconstr Aesthet Surg ; 88: 330-339, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38061257

ABSTRACT

BACKGROUND: Autologous breast reconstruction is composed of diverse techniques and results in a variety of outcome trajectories. We propose employing an unsupervised machine learning method to characterize such heterogeneous patterns in large-scale datasets. METHODS: A retrospective cohort study of autologous breast reconstruction patients was conducted through the National Surgical Quality Improvement Program database. Patient characteristics, intraoperative variables, and occurrences of acute postoperative complications were collected. The cohort was classified into patient subgroups via the K-means clustering algorithm, a similarity-based unsupervised learning approach. The characteristics of each cluster were compared for differences from the complementary sample (p < 2 × 10⁻⁴) and validated with a test set. RESULTS: A total of 14,274 female patients were included in the final study cohort. Clustering identified seven optimal subgroups, ordered by increasing rate of postoperative complication. Cluster 1 (2027 patients) featured breast reconstruction with free flaps (50%) and latissimus dorsi flaps (40%). In addition to its low rate of complications (14%, p < 2 × 10⁻⁴), its patient population was younger and with lower comorbidities when compared with the whole cohort. In the other extreme, cluster 7 (1112 patients) almost exclusively featured breast reconstruction with free flaps (94%) and possessed the highest rates of unplanned reoperations, readmissions, and dehiscence (p < 2 × 10⁻⁴). The reoperation profile of cluster 3 was also significantly different from the general cohort and featured lower proportions of vascular repair procedures (p < 8 × 10⁻⁴). CONCLUSIONS: This study presents a novel, generalizable application of an unsupervised learning model to organize patient subgroups with associations between comorbidities, modality of breast reconstruction, and postoperative outcomes.

Subjects

Breast Neoplasms, Free Tissue Flaps, Mammaplasty, Humans, Female, Unsupervised Machine Learning, Retrospective Studies, Mammaplasty/methods, Postoperative Complications/etiology, Free Tissue Flaps/surgery, Breast Neoplasms/complications
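The K-means step named in the abstract above can be sketched with a plain implementation of Lloyd's algorithm. This is a generic illustration, not the study's pipeline: the two features per patient, the fixed initial centroids, and the toy values are assumptions, and the study's feature encoding and choice of seven clusters are not reproduced.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, n_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid
    update until the centroids stop moving or n_iter is reached."""
    centroids = [tuple(float(v) for v in c) for c in centroids]
    labels = []
    for _ in range(n_iter):
        labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Toy patients described by two invented, already-scaled features.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
labels, centroids = kmeans(points, centroids=[(0, 0), (10, 10)])
print(labels, centroids)
```

Because K-means relies on Euclidean distance, features on different scales are usually standardized first, and the number of clusters is typically chosen by comparing several values of k.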

16.

Phenotype-Driven Molecular Genetic Test Recommendation for Diagnosing Pediatric Rare Disorders.

Chen, Fangyi; Ahimaz, Priyanka; Wang, Kai; Chung, Wendy K; Ta, Casey; Weng, Chunhua; Liu, Cong.

Res Sq ; 2023 Nov 22.

Article in English | MEDLINE | ID: mdl-38045411

ABSTRACT

Rare disease patients often endure prolonged diagnostic odysseys and may remain undiagnosed for years. Selecting the appropriate genetic tests is crucial for timely diagnosis. Phenotypic features offer great potential for aiding genomic diagnosis in rare disease cases, and we see great promise in integrating phenotypic information into the genetic test selection workflow. In this study, we present a phenotype-driven molecular genetic test recommendation model (Phen2Test) for pediatric rare disease diagnosis. Phen2Test was constructed using a frequency matrix of phecodes and demographic data recorded in the EHR before genetic tests were ordered, with the objective of streamlining the selection of molecular genetic tests (whole-exome/whole-genome sequencing, or gene panels) for clinicians with minimal genetic training. We developed and evaluated binary classifiers based on 1,005 individuals referred to genetic counselors for potential genetic evaluation. In the evaluation using the gold standard cohort, the model achieved strong performance with an AUROC of 0.82 and an AUPRC of 0.92. Furthermore, we tested the model on a silver standard cohort (n=6,458), achieving an overall AUROC of 0.72 and an AUPRC of 0.671. Phen2Test was adjusted to align with current clinical guidelines and showed superior performance with more recent data, demonstrating its potential for use within a learning healthcare system as a genomic medicine intervention that adapts to guideline updates. This study showcases the practical utility of phenotypic features in recommending molecular genetic tests with performance comparable to clinical geneticists. Phen2Test could assist clinicians with limited genetic training and knowledge to order appropriate genetic tests.
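
Illustrative note: the abstract above reports AUROC and AUPRC for a binary classifier. As a rough sketch of what these two metrics measure, the code below computes AUROC as the fraction of positive/negative pairs ranked correctly, and estimates AUPRC with average precision, which is one common estimator and not necessarily the one the authors used. The function names and the toy labels and scores are hypothetical.

```python
def auroc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(y_true, y_score):
    """Mean of the precision values at each rank where a positive is retrieved.

    Assumes no tied scores; with ties the result depends on sort order.
    """
    ranked = sorted(zip(y_score, y_true), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("average precision is undefined without positives")
    return total / hits

# Hypothetical toy predictions: 1 = genetic test indicated, 0 = not indicated.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
roc = auroc(y_true, y_score)
ap = average_precision(y_true, y_score)
print(round(roc, 3), round(ap, 3))
```

Because AUPRC depends on the proportion of positives in the cohort, values from cohorts with different class balance, such as the gold and silver standard cohorts, are not directly comparable.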

17.

Polygenic risk alters the penetrance of monogenic kidney disease.

Khan, Atlas; Shang, Ning; Nestor, Jordan G; Weng, Chunhua; Hripcsak, George; Harris, Peter C; Gharavi, Ali G; Kiryluk, Krzysztof.

Nat Commun ; 14(1): 8318, 2023 Dec 14.

Article in English | MEDLINE | ID: mdl-38097619

ABSTRACT

Chronic kidney disease (CKD) is determined by an interplay of monogenic, polygenic, and environmental risks. Autosomal dominant polycystic kidney disease (ADPKD) and COL4A-associated nephropathy (COL4A-AN) represent the most common forms of monogenic kidney diseases. These disorders have incomplete penetrance and variable expressivity, and we hypothesize that polygenic factors explain some of this variability. By combining SNP array, exome/genome sequence, and electronic health record data from the UK Biobank and All-of-Us cohorts, we demonstrate that the genome-wide polygenic score (GPS) significantly predicts CKD among ADPKD monogenic variant carriers. Compared to the middle tertile of the GPS for noncarriers, ADPKD variant carriers in the top tertile have a 54-fold increased risk of CKD, while ADPKD variant carriers in the bottom tertile have only a 3-fold increased risk of CKD. Similarly, the GPS significantly predicts CKD in COL4A-AN carriers. The carriers in the top tertile of the GPS have a 2.5-fold higher risk of CKD, while the risk for carriers in the bottom tertile is not different from the average population risk. These results suggest that accounting for polygenic risk improves risk stratification in monogenic kidney disease.

Subjects

Polycystic Kidney, Autosomal Dominant; Renal Insufficiency, Chronic; Humans; Penetrance; Renal Insufficiency, Chronic/genetics; Renal Insufficiency, Chronic/complications; Multifactorial Inheritance/genetics; Risk Factors
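
Illustrative note: the abstract above compares CKD risk in monogenic variant carriers across tertiles of a polygenic score, using noncarriers in the middle tertile as the reference. The sketch below shows only the bookkeeping of that kind of comparison with a crude, unadjusted risk ratio. It is a simplification and not the authors' statistical model, and the record layout, helper names, and every value are hypothetical.

```python
def tertile_cutoffs(scores):
    """Return the two score values that split the sorted scores into thirds."""
    s = sorted(scores)
    n = len(s)
    return s[n // 3], s[(2 * n) // 3]

def tertile(score, low_cut, high_cut):
    """Label a score as bottom, middle, or top tertile."""
    if score < low_cut:
        return "bottom"
    return "middle" if score < high_cut else "top"

def risk(records):
    """Crude proportion of records with the outcome (ckd == 1)."""
    return sum(r["ckd"] for r in records) / len(records)

# Hypothetical toy cohort: polygenic score, carrier status, CKD outcome.
cohort = [
    {"gps": 1, "carrier": False, "ckd": 0}, {"gps": 2, "carrier": True, "ckd": 0},
    {"gps": 3, "carrier": False, "ckd": 0}, {"gps": 4, "carrier": True, "ckd": 1},
    {"gps": 5, "carrier": False, "ckd": 0}, {"gps": 6, "carrier": False, "ckd": 1},
    {"gps": 7, "carrier": False, "ckd": 0}, {"gps": 8, "carrier": False, "ckd": 0},
    {"gps": 9, "carrier": True, "ckd": 1}, {"gps": 10, "carrier": False, "ckd": 0},
    {"gps": 11, "carrier": True, "ckd": 1}, {"gps": 12, "carrier": False, "ckd": 1},
]

low_cut, high_cut = tertile_cutoffs([r["gps"] for r in cohort])

def group(carrier, name):
    """Select records by carrier status and tertile label."""
    return [r for r in cohort
            if r["carrier"] == carrier and tertile(r["gps"], low_cut, high_cut) == name]

reference = risk(group(False, "middle"))      # noncarriers, middle tertile
rr_top = risk(group(True, "top")) / reference
rr_bottom = risk(group(True, "bottom")) / reference
print(rr_top, rr_bottom)
```

A real analysis would adjust for covariates such as age, sex, and ancestry and would report confidence intervals, none of which this toy calculation attempts.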

18.

Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT.

Yang, Jingye; Liu, Cong; Deng, Wendy; Wu, Da; Weng, Chunhua; Zhou, Yunyun; Wang, Kai.

ArXiv ; 2023 Nov 09.

Article in English | MEDLINE | ID: mdl-37986722

ABSTRACT

To enhance phenotype recognition in clinical notes of genetic diseases, we developed two models - PhenoBCBERT and PhenoGPT - for expanding the vocabularies of Human Phenotype Ontology (HPO) terms. While HPO offers a standardized vocabulary for phenotypes, existing tools often fail to capture the full scope of phenotypes, due to limitations from traditional heuristic or rule-based approaches. Our models leverage large language models (LLMs) to automate the detection of phenotype terms, including those not in the current HPO. We compared these models to PhenoTagger, another HPO recognition tool, and found that our models identify a wider range of phenotype concepts, including previously uncharacterized ones. Our models also showed strong performance in case studies on biomedical literature. We evaluated the strengths and weaknesses of BERT-based and GPT-based models in aspects such as architecture and accuracy. Overall, our models enhance automated phenotype detection from clinical texts, improving downstream analyses on human diseases.

19.

Strong protective effect of the APOL1 p.N264K variant against G2-associated focal segmental glomerulosclerosis and kidney disease.

Gupta, Yask; Friedman, David J; McNulty, Michelle T; Khan, Atlas; Lane, Brandon; Wang, Chen; Ke, Juntao; Jin, Gina; Wooden, Benjamin; Knob, Andrea L; Lim, Tze Y; Appel, Gerald B; Huggins, Kinsie; Liu, Lili; Mitrotti, Adele; Stangl, Megan C; Bomback, Andrew; Westland, Rik; Bodria, Monica; Marasa, Maddalena; Shang, Ning; Cohen, David J; Crew, Russell J; Morello, William; Canetta, Pietro; Radhakrishnan, Jai; Martino, Jeremiah; Liu, Qingxue; Chung, Wendy K; Espinoza, Angelica; Luo, Yuan; Wei, Wei-Qi; Feng, Qiping; Weng, Chunhua; Fang, Yilu; Kullo, Iftikhar J; Naderian, Mohammadreza; Limdi, Nita; Irvin, Marguerite R; Tiwari, Hemant; Mohan, Sumit; Rao, Maya; Dube, Geoffrey K; Chaudhary, Ninad S; Gutiérrez, Orlando M; Judd, Suzanne E; Cushman, Mary; Lange, Leslie A; Lange, Ethan M; Bivona, Daniel L.

Nat Commun ; 14(1): 7836, 2023 Nov 30.

Article in English | MEDLINE | ID: mdl-38036523

ABSTRACT

African Americans have a significantly higher risk of developing chronic kidney disease, especially focal segmental glomerulosclerosis (FSGS), than European Americans. Two coding variants (G1 and G2) in the APOL1 gene play a major role in this disparity. While 13% of African Americans carry the high-risk recessive genotypes, only a fraction of these individuals develops FSGS or kidney failure, indicating the involvement of additional disease modifiers. Here, we show that the presence of the APOL1 p.N264K missense variant, when co-inherited with the G2 APOL1 risk allele, substantially reduces the penetrance of the G1G2 and G2G2 high-risk genotypes by rendering these genotypes low-risk. These results align with prior functional evidence showing that the p.N264K variant reduces the toxicity of the APOL1 high-risk alleles. These findings have important implications for our understanding of the mechanisms of APOL1-associated nephropathy, as well as for the clinical management of individuals with high-risk genotypes that include the G2 allele.

Subjects

Glomerulosclerosis, Focal Segmental; Humans; Glomerulosclerosis, Focal Segmental/genetics; Apolipoprotein L1/genetics; Genetic Predisposition to Disease; Risk Factors; Genotype; Apolipoproteins/genetics

20.

Clusters, crop dusters, and myth busters: a scoping review of environmental exposures and cutaneous T-cell lymphoma.

Gordon, Emily R; Adeuyan, Oluwaseyi; Schreidah, Celine M; Chen, Caroline; Trager, Megan H; Lapolla, Brigit A; Fahmy, Lauren M; Weng, Chunhua; Geskin, Larisa J.

Ital J Dermatol Venerol ; 158(6): 467-482, 2023 Dec.

Article in English | MEDLINE | ID: mdl-38015484

ABSTRACT

INTRODUCTION: Cutaneous T-cell lymphoma (CTCL) is a heterogeneous group of non-Hodgkin lymphomas. Similar presentation to benign conditions, significant genetic variation, and lack of definitive biomarkers contribute to diagnostic delay. The etiology of CTCL is unknown, and environmental exposures, including geographic, occupational, chemical, sunlight, and insect exposures, have been investigated. EVIDENCE ACQUISITION: Review of the literature for CTCL and exposures was performed in PubMed and Google Scholar in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) Extension for Scoping Reviews. This search yielded 193 total results, which were initially screened with defined inclusion and exclusion criteria. The 45 remaining articles were reviewed and classified by exposure type. EVIDENCE SYNTHESIS: The most frequently investigated CTCL exposure type was geographic (13/45 articles, 29%). Chemical exposures were commonly discussed (10/45 articles, 22%), along with occupational exposures (10/45 articles, 22%). Insect exposures (6/45, 13%) and sun exposure (3/45, 7%) were also reviewed, along with articles describing multiple exposure types (3/45, 7%). Article types ranged from case reports to systematic reviews and case-control studies. Evidence linking CTCL and these exposures was mixed. Limitations of this investigation include reliance on patient reporting and frequent speculation on disease association versus causality. CONCLUSIONS: This investigation synthesizes the current literature on exposures potentially implicated in the pathogenesis of CTCL, while offering guidance on patient history-taking to ensure potential exposures are captured. Awareness of these possible associations may improve understanding of disease pathogenesis and diagnosis. Moreover, these insights may help with public health decision-making and disease mitigation.

Subjects

Lymphoma, Non-Hodgkin; Lymphoma, T-Cell, Cutaneous; Skin Neoplasms; Humans; Delayed Diagnosis; Lymphoma, T-Cell, Cutaneous/epidemiology; Lymphoma, T-Cell, Cutaneous/etiology; Environmental Exposure/adverse effects; Skin Neoplasms/epidemiology; Skin Neoplasms/etiology
